ABSTRACT

The SSU rRNA gene is one of the most widely utilized loci for phylogenetic inference among eukaryotic organisms. Although 18S rDNA sequences have an average length of 1800 to 1900 bp, several unusually large ones have been reported. After examining GenBank sequences and 180 new 18S rRNA sequences from several metazoan groups, we report many other extraordinary sequences ranging from ca. 1350 bp (in symphylan myriapods) to ca. 3300 bp (in some strepsipteran insects). Myriapods are particularly interesting, having independently evolved extraordinary sequences in the four classes (Chilopoda, Diplopoda, Symphyla, and Pauropoda). An insertion event of ca. 300 bp has been detected in all but the most basal family of geophilomorphan centipedes. Other major insertions are also found in other arthropod groups, in onychophorans, molluscs, chaetognaths, echinoderms, and parasitic platyhelminths. The use of information derived from secondary structure predictions combined with a new method to analyze DNA sequence data without multiple sequence alignments is proposed as a solution for analyzing sequence data that possess alternating conserved and variable regions, such as ribosomal genes.


INTRODUCTION 

The small-subunit ribosomal RNA gene is one of the most widely utilized loci in phylogenetic inference among eukaryotic organisms (e.g., Van de Peer and De Wachter, 1997), especially for the examination of phylogenetic relationships among metazoans. Ribosomal genes play a fundamental role in the synthesis of proteins in eukaryotic and prokaryotic cells. The SSU rRNA locus (in particular the 18S rRNA gene) has been widely used to infer phylogenetic relationships for reasons that have been elaborated elsewhere (e.g., Sogin, 1991; Adoutte and Philippe, 1993). The main rationale is (1) that this gene is long enough to provide phylogenetic information, (2) it is among the slowest evolving sequences found in all living organisms, and (3) different regions of the molecule evolve at different rates. For these reasons the 18S rRNA gene allows the inference of phylogenetic history across a broad taxonomic range. Moreover, the presence of many copies per genome and their homogenization through concerted evolution (Dover, 1982; Hillis and Dixon, 1991) greatly reduce intraspecific variation and facilitate DNA amplification via PCR.

The SSU rRNA gene of most organisms 
is between 1800 and 1900 bp in total length. 
However, exceptional cases of long SSU 
rRNA genes are not uncommon (see Crease 
and Colbourne, 1998 for some examples), 
with certain regions being particularly prone 
to sequence variability (both in nucleotide 
substitutions and in sequence length). 

In some cases, the SSU rRNA gene contains type I introns that are spliced out of the mature rRNA molecule (e.g., De Wachter et al., 1992; Wilcox et al., 1992; Bhattacharya et al., 1994). However, there are many long SSU rRNA genes that do not contain introns (e.g., Hinkle et al., 1994). Some of these long SSU rRNA genes have been reported for metazoans: Acyrthosiphon pisum, 2469 bp (Kwon et al., 1991); Xenos vesparum, 3316 bp (Chalwatzis et al., 1995); Daphnia pulex, 2293 bp (Crease and Colbourne, 1998); Euperipatoides leuckarti, 2206 bp (Aguinaldo et al., 1997); Echinococcus granulosus, 2394 bp (Picon et al., 1996). To our knowledge, type I introns have not been found in any metazoan taxon.

Other unusual phenomena have been reported for some organisms. The presence of more than one type of SSU rRNA gene has been found in the protozoan Plasmodium (Gunderson et al., 1987; Waters et al., 1989; Qari et al., 1994). Two types of 18S rRNA genes have also been reported in a group of metazoans, the freshwater and terrestrial planarians of the family Dugesiidae (Carranza et al., 1996, 1998a, 1998b).

Because conserved and nonconserved regions alternate in the SSU rRNA, sequence alignments are challenging. Some authors have based alignments on information derived from secondary structure models (see Kjer, 1995). However, different models (e.g., Gutell et al., 1985; Hendriks et al., 1988a, 1988b; Neefs and De Wachter, 1990; Van de Peer et al., 1998) can lead to different phylogenetic results (Winnepenninckx and Backeljau, 1996). Moreover, the variable regions cannot be aligned reliably and thus cannot be used as phylogenetic information, even if secondary structure predicts that a certain string of nucleotides is homologous (e.g., a variable loop situated between the two strands of a conserved stem).

The existence of different models can be attributed to problems in inferring secondary (and tertiary and quaternary) structures via direct observation. This was pointed out by Gutell and collaborators, who acknowledged that “any rigorous search for a secondary structure model for 16-S rRNA would necessitate use of the comparative method” (Gutell et al., 1985: 156). These authors also mentioned that the direct approach of X-ray crystallography remains only a remote possibility, because of the difficult procedure for preparing high-quality crystals of ribosomes or their subunits, together with the added problem that tertiary structure may be directed and/or stabilized by quaternary interactions (Gutell et al., 1985).

Where direct approaches have failed to infer the secondary structure of the SSU rRNA gene, some indirect approaches have succeeded. The comparative method has been successful to the point of building a database of hundreds of secondary structures of SSU rRNA. This database, the SSU ribosomal subunit RNA database (http://rrna.uia.ac.be/rrna/ssu [Van de Peer et al., 1998]), is maintained and updated constantly, and incorporates probably the largest comparative molecular data set.

In the present study, we report some extraordinary examples showing enormous variation in length among 18S rRNA sequences of different metazoan taxa. We also comment on the regions of the molecule that concentrate the largest variation. Finally, we explore a novel method that facilitates the use of variable regions of the molecule, which cannot be accommodated into standard phylogenetic analyses using sequence alignments.

MATERIALS AND METHODS

Genomic DNA samples were obtained from fresh, frozen, or ethanol-preserved tissues in a solution of guanidinium thiocyanate homogenization buffer following a modified protocol for RNA extraction (Chirgwin et al., 1979). The 18S rDNA locus was PCR-amplified in two or three overlapping fragments of about 950, 900, and 850 bp each, using primer pairs 1F-5R, 3F-18Sbi, and 5F-9R, respectively. Primers used in amplification and sequencing were described in Giribet et al. (1996, 1999). Amplification was carried out in a 50 µL reaction volume, with 1.25 units of AmpliTaq DNA Polymerase (Perkin Elmer), 200 µM of dNTPs, and 1 µM of each primer. The PCR program consisted of an initial denaturing step at 94°C for 60 seconds, 35 amplification cycles (94°C for 15 sec, 49°C for 15 sec, 72°C for 15 sec), and a final step at 72°C for 6 minutes in a GeneAmp PCR System 9700 (Perkin Elmer).
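The cycling program described above can be written down as a small data structure, which makes the total programmed hold time easy to check. This is only an illustrative sketch: the names `PCR_PROGRAM` and `total_hold_seconds` are hypothetical, and ramp times between temperatures (which depend on the thermal cycler) are deliberately ignored.

```python
# Amplification program as stated in the text: (temperature in °C, hold in seconds).
# All names here are illustrative, not part of any instrument software.
PCR_PROGRAM = {
    "initial_denaturation": (94, 60),
    "cycles": 35,
    "cycle_steps": [(94, 15), (49, 15), (72, 15)],  # denature, anneal, extend
    "final_extension": (72, 6 * 60),
}


def total_hold_seconds(program):
    """Sum the programmed hold times, ignoring temperature ramps."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_extension"][1]
    )


print(total_hold_seconds(PCR_PROGRAM))  # 1995 seconds, i.e. about 33 minutes
```

The real run time will be longer than this figure, since ramping between 94°C, 49°C, and 72°C is not included.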

PCR samples were purified with the GENECLEAN III kit (BIO 101 Inc.) and directly sequenced using an automated ABI Prism 377 DNA sequencer. Cycle-sequencing with AmpliTaq DNA Polymerase, FS (Perkin Elmer) using dye-labeled terminators (ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit) was performed in a GeneAmp PCR System 9700 (Perkin Elmer). Amplification was carried out in a 10 µL reaction volume: 4 µL of Terminator Ready Reaction Mix, 10-30 ng/mL of PCR product, 5 pmoles of primer, and dH2O to 10 µL. The cycle-sequencing program consisted of a step at 94°C for 3 minutes, 25 sequencing cycles (94°C for 10 sec, 50°C for 5 sec, 60°C for 4 min), and a rapid thermal ramp to 4°C and hold. The BigDye-labeled PCR products were isopropanol-precipitated following the manufacturer's protocol.

Some PCR products that could not be sequenced directly were purified and ligated into pUC18 SmaI/BAP dephosphorylated vector using the SureClone Ligation Kit (Pharmacia P-L Biochemicals) as described in Giribet et al. (1996). Sequencing was then performed by the dideoxy termination method (Sanger et al., 1977) using T7 DNA polymerase (Sequencing Kit from Pharmacia Biotech).

All sequences have been deposited in GenBank (see taxonomy, sequence length, and accession codes in table 1). The new sequences have been compared to other published sequences available from GenBank. The terminology used for the secondary structure topology follows the nomenclature of Van de Peer et al. (1998).

RESULTS AND DISCUSSION 

Extraordinary 18S rRNA Sequences of 
Metazoans 

After studying the 18S rRNA gene of about 400 metazoan taxa (180 of these sequences collected by the authors; table 1), we have observed that several animal taxa possess large insertions at different regions of the molecule. This variation is shown in figure 1 and table 1. For example, within the phylum Mollusca, the cephalopods Loligo pealei (squid), Sepia elegans (cuttlefish), and Nautilus scrobiculatus present large insertions in regions V2, V4, V7, and V9. Other groups of molluscs such as the anomalodesmatan bivalves present large insertions in region V7, and some Archaeogastropoda have insertions in regions V2 and V4. But insertions are not restricted to the phylum Mollusca. Sea cucumbers (Echinodermata, Holothuroidea) and arrow-worms (Chaetognatha) present insertions in region V4; some leeches (Annelida, Hirudinea) present insertions in region V7; some parasitic planarians (Platyhelminthes) present insertions in both regions V4 and V7; and velvet worms (Onychophora) present insertions in regions V2, 11, E23-7, V7, and V9. Within the arthropods, there are many extraordinary cases among insects (see the strepsipteran sequences reported by Chalwatzis et al., 1995, 1996; Whiting et al., 1997) and certain crustacean groups. But the most bizarre case within arthropods (and perhaps in the entire Metazoa) is that of the myriapods.

The 18S rRNA Gene of the Myriapods 

Myriapods comprise four groups of terres¬ 
trial arthropods. Centipedes (class Chilopo- 




AMERICAN MUSEUM NOVITATES 


NO. 3336 


TABLE 1 

List of the 180 Species of 18S rRNA Sequences Generated by the Authors 

(with GenBank accession codes and sequence length [excluding a total of 46 bp from the external
primers 1F and 9R]; asterisks refer to incomplete sequences, and thus the length is not reported)



GenBank 

bp 


GenBank 

bp 

Annelida (Polychaeta) (3 sp.) 


Lima lima 

AF120533 

1771 

Eunice torquata 

AF123304 

1768 

Limaria hians 

AF120534 

1767 

Dinophilus gyrociliatus 

AF119074 

1784 

Anomia ephippium 

AF120535 

1763 

Myzostoma sp. 

AF123305 

1770 

Psilunnio littoralis 

AF120536 

1766 

Phylum Sipuncula (3 sp.) 



Lampsilis cardium 

AF120537 

1765 

Aspidosiphon misakiensis 

AF119090 

1766 

Neotrigonia bednalli 

AF120538 

1765 

Themiste alutacea 

AF119075 

1757 

Pandora sp. 

AF120539 

2143 

Phascolopsis gouldii 

AF123306 

1765 

Lyonsia hyalina 

AF120540 

1986 

Phylum Echiura (2 sp.) 

Bonellia viridis 

AF123307 

1787 

Cuspidaria cuspidata 
Cardiomya costellata 

AF120541-2* 

AF120543 

1804 

AF119076 

1772 

Myonera sp.

AF120544

1884 

Urechis sp. 

Chama gryphoides 

AF120545* 

Phylum Nemertea (2 sp.) 



Codakia cfr. orbiculata 

AF120546* 


Prostoma eilhardi 

U29494 

1790 

Galeomma turtoni 

AF120547

1775 

Amphiporus sp. 

AF119077 

1778 

Lasaea sp. 

AF120548

1774 

Phylum Mollusca (61 sp.) 



Cardita calyculata 

AF120549 

1727 

Polyplacophora (2 sp.) 



Cardites antiquata 

AF120550 

1775 

Lepidopleurus cajetanus 

AF120502 

1761 

Astarte castanea 

AF120551 

1775 

Acanthochitona sp. 

AF120503 

1763 

Dreissena polymorpha 

AF120552 

1782 

Cephalopoda (3 sp.) 



Parvicardium exiguum 

AF120553* 


Nautilus scrobiculatus 

AF120504 

2485 

Abra sp. 

AF120554 

1770 

Loligo pealei 

AF120505 

2221 

Ensis ensis 

AF120555

1765 

Sepia elegans 

AF120506-7* 


Calyptogena magnifica 

AF120556 

1777 

Gastropoda (14 sp.) 



Corbicula fluminea 

AF120557 

1777 

Cocculina messingi 

AF120508 

1775 

Sphaerium striatinum 

AF120558 

1781 

Entemotrochus adansonianus 

AF120509 

1991 

Mercenaria mercenaria 

AF120559 

1779 

Perotrochus midas 

AF120510 

1986 

Mya arenaria

AF120560 

1783 

Haliotis tuberculata 

AF120511 

1809 

Varicorbula dissimilis 

AF120561 

1795 

Sinezona confusa 

AF120512 

1810 

Gastrochaena dubia 

AF120562 

1777 

Diodora graeca 

AF120513 

1855 

Hiatella arctica

AF120563

1774 

Clanculus cruciatus 

AF120514 

1733 

Bankia carinata 

AF120564

1791 

Theodoxus fluviatilis 

AF120515 

1767 

Phylum Brachiopoda (1 sp.) 


Viviparus georgianus 
Truncatella guerinii 
Truncatella sp. 2 

AF120516 

AF120517 

AF120518 

1795 

1834 

1827 

Argyrotheca cordata 

Phylum Phoronida (2 sp.) 

Phoronis australis 
Phoronopsis viridis 

AF119078 

AF119079 

1762 

1767 

1765 

Balds eburnea 

AF120519 

1758 

AF123308 

Rissoella caribea 

AF120520 

2239 

Discodoris atromaculata 

AF120521 

1858 

Phylum Bryozoa (3 sp.) 



Scaphopoda (2 sp.) 



Lichenopora sp.

AF119080

1785 

Dentalium pilsbryi 

AF120522 

1804 

Membranipora sp. 

AF119081 

1761 

Rhabdus rectius 

AF120523 

1810 

Caberea boryi 

AF119082

1772 

Bivalvia (40 sp.) 



Nemertodermatida (1 sp.) 



Solemya velum 

AF120524 

1771 

Meara stichopi 

AF119085 

1768 

Nucula sulcata 

AF120525 

1765 

Phylum Priapula (1 sp.) 



Nucula proximo 

AF120526 

1766 

Tubiluchus corallicola 

AF119086 

1768 

Acila castrensis 

AF120527 

1765 

Phylum Onycophora (2 sp.) 



Yoldia limatula 

AF120528 

1767 

Peripatopsis capensis 

AF119087

2174 

Nuculana minuta 

AF120529 

1770 

Epiperipatus biolleyi 

AFXXXXXX* 


Lithophaga lithophaga 

AF120530 

1767 



Striarca lactea 

AF120531 

1765 

Phylum Tardigrada (1 sp.) 



Pteria hirundo 

AF120532 

1775 

Macrobiotus hufelandi 

X81442 

1762

TABLE 1 (Continued)


GenBank bp 


Phylum Arthropoda
Chelicerata (49 sp.) 


Achelia echinata 

AF005438 

1795 

Callipallene sp. 

AF005439 

1787 

Endeis laevis 

AF005441 

1791 

Colossendeis sp. 

AF005440 

1793 

Limulus polyphemus 

U91490 

1759 

Carcinoscorpius rotundicaudatus 

U91491 

1759 

Belisarius xambeui 

AF005442 

1761 

Pseudocellus pearsei 

U91489 

1763 

Ricinoididae sp. 

AF124930 

1757 

Gluvia dorsalis 

AF007103 

1761 

Eusimonia wunderlichi 

U29492 

1762 

Chanbria regalis 

AF124931 

1763 

Stenochrus portoricensis 

AF005444 

1759 

Trithyreus pentapeltis 

AF124932 

1761 

Mastigoproctus giganteus 

AF005446 

1760 

Paraphrynus sp. 

AF005445 

1760 

Amblypygidae sp. 

AF124933 

1759 

Liphistius bicoloripes 

AF007104 

1762 

Nesticus celullanus 

AF005447 

1763 

Roncus cfr. pugnax 

AF005443 

1761 

Americhernes sp. 

AF124934 

1762 

Opilioacarus texanus 

AF124935 

1763 

Siro rubens 

U36998 

1763 

Parasiro coiffaiti 

U36999 

1761 

Stylocellus n.sp. 

U91485 

1763 

Dalquestia formosa 

AF124936 

1761 

Odiellus troguloides 

X81441 

1759 

Opilio parietinus 

AF124938 

1761 

Astrobunus grallator 

AF124939 

1761 

Nelima sylvatica 

U91486

1762 

Leiobunum sp. 

AF124940 

1761 

Hadrobunus cfr. maculosus 

AF124941 

1761 

Caddo agilis 

U91487 

1766 

Ischyropsalis luteipes 

U37000 

1758 

Hesperonemastoma modestum 

AF124942 

1762 

Sabacon cavicolens 

AF124944 

1762 

Dicranolasma soerenseni 

U37001 

1756 

Centetostoma dubium 

U37002 

1758 

Nemastoma bimaculatum

AF124947

1758 

Equitius doriae 

U37003 

1762 

Triaenobunus sp. 

AF124950 

1763 

Zuma acuta 

AF124951 

1762 

Oncopus cfr. alticeps 

U91488 

1762 

Scotolemon lespesi 

U37005 

1760 

Maiorerus randoi 

U37004 

1763 

Bishopella laciniosa 

AF124952

1763 

Gnidia holnbergii 

U37006 

1760 

Pachyloides thorellii 

U37007 

1761 

Hoplobunus sp. 

AF124953

1762 



GenBank 

bp 

Hexapoda (11 sp.) 

Podura aquatica 

AF005452 

1761 

Acerentulus traeghardi 

AF005453 

1955 

Campodea tillyardi 

AF173234 

1851 

Campodeidae sp. 

AF005455* 


Catajapyx sp. 

AF005456* 


Dilta littoralis 

AF005457 

1792 

Machiloides sp. 

AFXXXXXX 

1790 

Lepisma sp. 

AF005458 

1785 

Thermobius sp. 

AFXXXXXX 

1788 

Tricholepidion gertschi 

AFXXXXXX 

1809 

Myriapoda (38 sp.) 

Cylindroiulus punctatus 

AF005448 

1785 

Polydesmus coriaceus 

AF005449 

1783 

Scutigerella sp.1

AF007106 

1299 

Scutigerella sp.2 

AF005450* 


Hanseniella sp. 

AF173237* 


Pauropodidae sp. 

AF005451 

2182 

Scutigera coleoptrata

AF000772 

1819 

Thereuopoda clunifera 

AF119088 

1817 

Allothereua maculata 

AF173240* 


Lithobius variegatus 

AF000773 

1814 

Australobius scabrior 

AF173241 

1815 

Paralamyctes n.sp. 

AF173242 

1818 

Lamyctes emarginatus 

AF173244 

2099 

Henicops maculatus 

AF173245

2231 

Anopsobius n. sp. 

AF173247 

1944 

Craterostigmus tasmanianus 

AF000774 

1814 

Scolopendra cingulata 

U29493 

1841 

Cormocephalus monteithi 

AF173249 

1842 

Ethmostigmus rubripes 

AF173250 

1844 

Alipes sp. 

AF173251

1844 

Rhysida nuda 

AF173252 

1846 

Cryptops trisulcatus 

AF000775 

1819 

Theatops erythrocephala 

AF000776 

1818 

Scolopocryptops nigridus 

AF173253 

1817 

Mecistocephalus sp. 

AF173254 

1820 

Pseudohimantarium 

mediterraneum 

AF000778 

2157 

Henia (Chaetechelyne) 
vesuviana 

AF173255 

2194 

Pectiniunguis argentinensis 

AF173256 

2006 

Schendylops pampeanus 

AF173257 

2053 

Ballophilus australiae 

AF173258 

2108 

Clinopodes cfr. poseidonis 

AF000777 

2224 

Tasmanophilus sp. 

AF173259 

1930 

Tuoha sydneyensis 

AF173260 

2083 

Zelanion antipodus 

AF173261 

2194 

Zelanion sp. 

AF173262 

2224 

Ribautia n. sp. 

AF173263 

2218 

Aphilodon weberi 

AF173264 

2015 

Strigamia maritima 

AF173265

2122 

Phylum Enteropneusta (1 sp.) 


Glossobalanus minutus 

AF119089 

1776

Fig. 1. Schematic representation of the 18S rRNA locus. The gray squares represent the variable regions V2, V4, V7, and V9 with insertions (V2: Onychophora, Geophilomorpha, Cephalopoda, Archaeogastropoda; V4: Hexapoda, Crustacea, Pauropoda, Holothuroidea, Chaetognatha, Platyhelminthes, Cephalopoda; V7: Onychophora, Hexapoda, Crustacea, Pauropoda, Chilopoda, Platyhelminthes, Hirudinea, Cephalopoda, Gastropoda; V9: Onychophora, Crustacea, Cephalopoda). The black arrowheads represent particular insertions (10: Pauropoda; 11: Onychophora; E23-7: Onychophora and Pauropoda; E23-8: Pauropoda; 29: Pauropoda; 46: Protura). The black bar represents the 500 bp deletion of the Symphyla.


da) and millipedes (class Diplopoda) are the two principal classes of myriapods. The two other classes of myriapods lack common names: Symphyla and Pauropoda. Prior to the analysis of Edgecombe et al. (1999), the only complete 18S rRNA sequence data for myriapods available at GenBank were from eight centipedes and two millipedes (Giribet et al., 1996, 1999; Giribet and Ribera, 1998). Two centipede species of the order Geophilomorpha (Clinopodes poseidonis and Pseudohimantarium mediterraneum) present an insertion of about 300 bp at region V7, whereas all the other available sequences are fairly conserved in terms of primary sequence. However, a wider ongoing study on the 18S rRNA gene of myriapods suggests that they constitute one of the most interesting cases of 18S rRNA variation in any metazoan group.

Within the centipedes, the members of the order Geophilomorpha (15 species studied, belonging to 9 families), excluding two species of the most basal family Mecistocephalidae (Edgecombe et al., 1999), exhibit insertions of about 300 bp in region V7. This is an unusual example that shows exactly when the insertion occurred during the phylogenetic process, and illustrates the potential phylogenetic information carried by such insertions (fig. 2).

Within the millipedes, members of the family Polyzonidae display sequences longer than 2700 bp. Data for one species of Pauropoda show that the 18S rRNA is ca. 2200 bp, with several small insertions (Giribet, 1997; Giribet and Ribera, 2000). But perhaps the most unusual case among metazoan 18S rRNA sequences is the class Symphyla. Amplification of the 18S rRNA loci of three species belonging to two genera (two species of Scutigerella from northeastern Spain and the Canary Islands, respectively, and one species of Hanseniella from Australia) yielded a product band size of about 1350 bp. Sequencing this fragment suggests a deletion of about 500 bp in the central region of the molecule.
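As a rough consistency check on these figures, the lengths listed in table 1 (which exclude 46 bp of external primers) can be compared directly. The sketch below uses only values from table 1; the variable names are illustrative, and the comparison against two millipedes and one centipede is our choice of reference, not a calculation given in the text.

```python
# Lengths from table 1 (bp, excluding 46 bp of external primers 1F and 9R).
PRIMER_BP = 46
SCUTIGERELLA_SP1 = 1299
REFERENCE_MYRIAPODS = {
    "Cylindroiulus punctatus": 1785,
    "Polydesmus coriaceus": 1783,
    "Lithobius variegatus": 1814,
}

# Expected size of the amplified product, primers included.
product_bp = SCUTIGERELLA_SP1 + PRIMER_BP
print(product_bp)  # 1345, consistent with a band of about 1350 bp

# Length deficit of the symphylan relative to each reference myriapod.
deficits = {name: bp - SCUTIGERELLA_SP1 for name, bp in REFERENCE_MYRIAPODS.items()}
print(deficits)  # 486, 484 and 515 bp, i.e. roughly 500 bp
```

A difference in total length does not by itself locate the deletion; that inference rests on the sequence comparison described in the text.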

Although it might be conjectured that in this case a nonfunctional pseudogene has been sequenced, as occurred with the 18S rRNA locus of the platyhelminth Dugesia mediterranea (Carranza et al., 1996) and other dugesiids (Carranza et al., 1998a, 1998b), this seems improbable for several reasons. First, this sequence has been obtained from three different species and in two independent laboratories. Second, none of the highly conserved primers from the “deleted” region (forward primers 4F, 18Sa0.7, 18Sa0.79, 18Sa1.0; reverse primers 4R, 18Sb5.0, 18Sb3.9, 18Sb3.0 [Giribet et al., 1996; Whiting et al., 1997]) amplified any DNA fragment when combined with primers from other regions. If the 1350 bp fragment were a pseudogene, we would expect to amplify fragments of the original gene when using the conserved primers located within the “deleted” region. Third, phylogenetic analyses including the symphylan sequences show that symphylans are arthropods related to other myriapods (Giribet, 1997; fig. 3). Fourth, amplification of DNA from an RNA source, as described by Carranza et al. (1998a), yielded a product band size of approximately 1350 bp, as expected. These facts demonstrate that a deletion of ca. 500 bp occurred in the common ancestor of these three symphylan species.

Fig. 2. Phylogenetic tree of the centipedes based on the combined analysis of Edgecombe et al. (1999). The arrow indicates where the insertion of ca. 300 bp at region V7 occurred during the evolution of centipedes.

18S rRNA Variation in Metazoans 

It seems that large insertions and deletions are not as constrained as was previously thought (e.g., Crease and Colbourne, 1998). These events occur in many metazoan taxa, and are commonly located in regions V2, V4, V7, and V9. Other parts of region 23 (which includes region V4) are also variable. In a phylogenetic study of about 150 arthropod 18S rRNA sequences (Giribet and Ribera, 2000), insertions at region E23-7 were observed in Onychophora and in Pauropoda, while insertions at region E23-8 were observed in the pauropod species. Other insertions observed in particular taxa occur at sites 8 (in the millipede Polyzonium), 11 (in Onychophora), 29 (in Pauropoda), and 46 (in the proturan Acerentulus traeghardi). However, we only obtained sequences from one pauropod and one proturan, hence these results cannot be generalized to other members of such groups.

Certain taxa present insertions in variable regions whereas in the remaining regions the primary sequence may be conserved. For example, certain geophilomorph centipedes exhibit a small insertion at region V2 (between 10 and 80 bp compared to other centipedes) and a large insertion (about 300 bp) at region V7, whereas the remaining positions are conserved with respect to other centipedes. Other taxa not only present insertions in the variable regions, but also differ in the primary sequence elsewhere. This is the case in the cephalopods, which have insertions in regions V2, V4, V7, and V9, and differ considerably from other molluscs in the primary sequence of the remaining regions.

Although several metazoans present insertions in the 18S rRNA, reduction of the 18S rRNA gene appears to be a rare event in evolution. To our knowledge, there are no other reported cases of 18S rDNA sequences with a deletion of the magnitude of that observed in the symphylans (ca. 500 bp). The deletion corresponds to the central region of the molecule (approximately from region 14 to E23-9). The remaining primary sequence is fairly conserved, and we assume that it may be functional. However, reconstruction of a global secondary structure cannot be conducted using the comparative method. Another case of sequence length reduction in the 18S rRNA locus occurs in the dicyemid mesozoans (three species of the genus Dicyema), with a total gene sequence of about 1670 bp that presents two major deletions at the variable sites V2 and V9.

Fig. 3. Phylogenetic tree based on 18S rRNA sequence data indicating the position of two symphylans (box) with respect to other myriapods (underlined taxa) in a phylogenetic analysis of arthropods (from Giribet, 1997). The two symphylans appear related to other myriapods.

The geophilomorphan centipedes that present the insertion of about 300 bp at region V7 (13 species) also display a large insertion (about 300 bp) at the D3 expansion fragment of the large subunit rRNA locus (28S rRNA). Neither the insertion at region V7 of the 18S rRNA nor that at the D3 expansion fragment of the 28S rRNA locus has been found in the putative most basal geophilomorph family Mecistocephalidae (Giribet et al., 1999; Edgecombe et al., 1999). This apparent correlation between the insertions at region V7 of the 18S rRNA locus and at the D3 expansion fragment of the 28S rRNA locus (also observed in the cephalopod Loligo) could suggest a possible interaction between these subunits in the ribosome.

18S rRNA Variable Sites and Phylogenetic 
Analyses 

The presence of certain variable sites in the 18S rRNA molecule can hinder phylogenetic analyses at the alignment step, a fact emphasized by several researchers who avoid the use of ribosomal genes for drawing inferences (e.g., Ayala et al., 1998). In general, researchers using ribosomal genes exclude the variable regions from their phylogenetic inference step because of uncertainties in alignments (e.g., Giribet et al., 1996, 1999). But since data removal from phylogenetic analyses is also problematic (see Gatesy et al., 1993), and automatic alignments are explicit, other researchers prefer the use of automatic alignments exploring different cost matrices (e.g., Wheeler, 1995). Regardless of which of these options is the best (philosophically, computationally, or practically), the extreme length variation of some of the SSU rRNA helices may constitute a serious problem in the phylogenetic analysis of certain data sets.

A clear example is illustrated in figure 4, where we show the variable region V7 of 17 centipede taxa (see table 2 for the taxonomy). The fragment as a whole can be considered homologous because it is located between two conserved regions that constitute a stem. The most common sequence length for the V7 region in centipedes ranges between 65 and 70 bp (character state present in the five recognized orders of centipedes). However, three taxa (Scolopendra, Ethmostigmus, and Alipes; family Scolopendridae) display sequences between 93 and 94 bp, and four taxa (Pseudohimantarium, Henia, Clinopodes, and Zelanion; belonging to four families of Geophilomorpha) exhibit sequences between 354 and 384 bp. These two clades are well defined morphologically and could also be characterized in terms of sequence length or perhaps secondary structure topology. However, a standard phylogenetic analysis of this fragment cannot be conducted successfully when base-to-base homology is required, as is the case with multiple sequence alignments. This situation is frustrating since the sequence data clearly display historical information that cannot be used phylogenetically.
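The length classes just described can be checked directly against the printed fragments. The following sketch takes three of the shorter V7 sequences as printed in figure 4, strips the spacing, and assigns each to one of the three length ranges given in the text. The function names and class labels are illustrative only.

```python
# Three V7 fragments as printed in figure 4 (blocks of 10 separated by spaces).
FIG4_V7 = {
    "Scutigera": "ACGATCGATT TAGGCGAGCT GTTTCCTTCC CCCGNGGNTG GAGCGGCACT "
                 "GCCTCCGTCG GTCGACAA",
    "Lithobius": "ACGACTGATC CCGGGGTGCC GGGCCCTCTT CGNGGGGGAA CGGTGTTGCC "
                 "TCCGTCAGTT GTTCG",
    "Scolopendra": "ACGTCCGATC CTGGGGTGCC GGTGCCTCCT AAACCTCCGC TCTTTCGAAA "
                   "AGAGTGGGAG GCGGGGGAAC GGCTTTGCCT CTGTCGGACG ATTG",
}


def v7_length(fragment):
    """Length in bp after removing the spacing used in the figure."""
    return len("".join(fragment.split()))


def length_class(n_bp):
    """Assign a length to one of the ranges reported in the text."""
    if 65 <= n_bp <= 70:
        return "common (65-70 bp)"
    if 93 <= n_bp <= 94:
        return "Scolopendridae (93-94 bp)"
    if 354 <= n_bp <= 384:
        return "long geophilomorph (354-384 bp)"
    return "unclassified"


for taxon, fragment in FIG4_V7.items():
    n = v7_length(fragment)
    print(taxon, n, length_class(n))
# Scutigera 68, Lithobius 65, Scolopendra 94
```

Length alone separates these groups, but it says nothing about which bases correspond to which, and that is the difficulty for alignment-based analyses.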

A new method to analyze DNA sequence data that does not require base-to-base correspondences was recently developed by Wheeler (1999) and is discussed in greater detail there. Briefly, the method, named “fixed character states”, optimizes DNA sequence data without employing multiple sequence alignments by treating entire homologous stretches of sequence data as characters. The set of specific sequences exhibited by the terminal taxa constitutes the character states. Thus the number of states is equal to the number of unique sequences (or homologous fragments) exhibited by the data. In the example illustrated here, there is one character (region V7) with 17 states (as many as different taxa). Other situations could arise where the number of states would be smaller than the number of taxa if two or more were
to share identical sequences. The salient fea- 
Scutigera

1 ACGATCGATT TAGGCGAGCT GTTTCCTTCC CCCGNGGNTG GAGCGGCACT GCCTCCGTCG GTCGACAA 

Thereuopoda 

1 ACGATCGATT TGGGCGAGCT GTTTCCTGCC TTCACGGTAG GAGCGGCACT GCCTCCGTTG GTCGATAA 


Lithobius 

1 ACGACTGATC CCGGGGTGCC GGGCCCTCTT CGNGGGGGAA CGGTGTTGCC TCCGTCAGTT GTTCG 

Australobius 

1 ACGACCGATC CCGGGGTGCC GGTGCCCTCT TCGGGGGGAA CGGTGTTGCC TCTGTCGGTT GATCG 

Craterostigmus 

1 ACGACCGATC CCGGGGTGCC GTCTCCTTCC TCGTGATGGA GCGGCGTTAC CTCCGTCGGC CGATCG


Scolopendra 

1 ACGTCCGATC CTGGGGTGCC GGTGCCTCCT AAACCTCCGC TCTTTCGAAA AGAGTGGGAG GCGGGGGAAC GGCTTTGCCT CTGTCGGACG ATTG 

Ethmostigmus

1 ACGTCCGATC TCGGGGTGCC GGTGCCTCCT AAACCCCCCC TTCTTCGATG GAGCGGGGGG CGGGGGAACG GCTTTGCCTC TGTCGGATGA TCG 


Alipes 

1 ACATTCGATC TCGGGGTGCT GGCACCTCCT ACACCCCCCC TTCTTTCATG GAGTGGGGGG CGGGGGAACA GCTTTGCCTC TGTCGGATGA TCG 


Cryptops 

1 ACGTCCGATC TCGAGGTGCC GTTACCTTtC TCCTCGTGAG GGGTtCGGCT TTGCCTCTGT CCGACGATTA 


Theatops 

1 ACGTCCGATC TCGGGGTGCC GTCTC CTTCT CCTCGMGAGA GGTGCGGCTT TGCCTCTGTC GGACGATTG 


Scolopocryptops 

1 ACGTCCGATC TCGGGGTGCC GTTGCCATCT CCTCGTGAGG GGCTCGGCCT GACCTCTGTT GGACGATTG 


Mecistocephalus 

1 ACGACTGGTC TTGGGGTGCT GGTTCTATTC CTTCATGGGT AGCCAGCTTT TGCCTCCGTC GGTCGATTT 


Nodocephalus 

1 ACGACTGATC TCGGGGTGCT GGATCTATTC CTTCGTGGAT AGCCGGTAGT TGCCTCTGTT CGTCGTCCGA 


Pseudohimantarium 

1 ACGACTGATC CCGGGGTGCC GGTGTCCCCC CTTCTGTCGC TTTAATTTTT TGTCTGCGGC ATGTTGCCGT TTGCTTTCTT GGGTGTATCT TGCTGATCCC 
101 YTACTCATGT TTTCTATCAC CTCCCCACTC ATCGAGTGTT TTGCGGCTGG TTTCTGCCTC TGGTCGGTAT TGCATTTACG CCRTCGCGGG TATCGTGCGT 
201 ATCGGTCATG TGGTTGCCTC GTTGCTTTCG CTCTGTGTTG TGGTGTGTGT GTGTGTGGGG CGTGTTGAGG GCAAAGGCAT TATGATTCTC GAGAGGAGTA 
301 GCGGTCTTGC GGGACATTAA TGGGCGGTCA GGCTAGGTGG GGTCGCACAC GGCGTTGCYT CTGTCAGTCG ACAGG 


Henia 

1 ACGACCGATC CCGGGGTGCC GGTGCTCCCA TCTTGCTTCT GTTTGTCCGT TTTTCGCTCA GGCGACTCTC GCGTCTCGCC TCTCTCTGCG GACGTTCGGT 
101 TCAACTCGGC GCGGCAGAGG CCACGCTCTC GACTTCCCCT CGTGTTTCGG CGGTTGCGTC GAGGTTGTCT CCGTGTCTCC CTTTCTCGGA AGGGCTGCGC 
201 GGCGGTCGGC CTTCGTCGTC ATCGTTCTGG CCGGGGGTGC ATGTTCGCGG GCGGGGATGA AACCGCTTCG AGCGCCGAGC GTCGGGAGTC CGGGCGTTTC 
301 TGCAGATAGC CGAAAGCGGG TTCGGGCGGC AGTTGTCGTT AGCATGGGAA AGCACCCGGC AGTTGCCTCC GTCGGTCGAA TACG 


Clinopodes 

1 ACGACCGATC CCGGGGTGCC GGCGCTCCCA TCTWGMTTCT GTTTGTTTCT GTCCGGCGYT GAGACGCTCT TTGTGGCTGC TGCGATTTTC TTGCTCCTCC 
101 GGGTTTTTTC TWTCNCCYGC CGGGCGGGAR AAAGAARGGT CTGATGCGTG GGGTGTGTGT GTGAAGCTGT CTTTYTCTCT CTTTCGAGGG GGGAGAGGCT 
201 GTCTTTTTCT CGCTCGCTCT AAGCTCTTTC CCYTTTCTTT CTTTCCGTTT ACGGAGGTGA TKGATATTTT CTCGGCTCTC GAGCGGGTCT TTCAAGGCGC 
301 ACGAATCGTC SACTGCGTTG GTCGAGACGG CAGTTGTCGT TAGCATGGGT CAGCGTGTGG CGTTGCCTCT GTCGGTCGAW TACG 


Zelanion 

1 ACGACCGATC CCGGGGTGCC GGCGCCTTCT CGTCTTGCGG CCGTTTCGCT CCCCTCTCGG GTTTCGCGTC GTTCTCTTTT CCCTTCCTCA CATCCGTCGC 
101 GGGGAGGCGG TGCGTGCGTT TCACTCGCGG TAGATGTCGC GGAGTTGGCG GAATAAGGCT GCCGCTCTCC CTTTCGGGGG TTGAGTGGGG TCCCGTCGCT 
201 CCGTTTCACT CTCGGGCGTA CGCGCGCATT CGTTTYTCCC TTCTTCGGGG ACTGGGGCTC GAGGGATCGA GGCGCGCGGG ACTTTGCTGA GGGACCGGCG 
301 ACGGTTGTCA TTTGCGCGAG TCGGCTGTCC GGCGTTGCCT CTGTCGGTCG ATCG 

Fig. 4. Variable region (V7) of the 18S rRNA locus of 17 species of centipedes. 


ture of this method is the treatment of length variation. Since the sequence variation
is expressed through a series of transformations between states, indel or “gap”
variation only occurs as transformations between sequences, not globally among all the
sequence data. This has the effect of moderating the difficulties presented by extreme
nucleotide length variation at the expense of treating strings of bases as character
states instead of individual nucleotides.

TABLE 2

Taxonomy of the Centipede Species (Chilopoda) Represented in Figures 4 and 5

Order Scutigeromorpha
    F. Scutigeridae
        Scutigera coleoptrata
        Thereuopoda clunifera

Order Lithobiomorpha
    F. Lithobiidae
        Lithobius variegatus
        Australobius scabrior

Order Craterostigmomorpha
    F. Craterostigmidae
        Craterostigmus tasmanianus

Order Scolopendromorpha
    F. Scolopendridae
        Scolopendra cingulata
        Ethmostigmus rubripes
        Alipes crotalus
    F. Cryptopidae
        Cryptops trisulcatus
        Theatops erythrocephala
        Scolopocryptops nigridus

Order Geophilomorpha
    F. Mecistocephalidae
        Mecistocephalus sp.
        Nodocephalus doii
    F. Himantariidae
        Pseudohimantarium mediterraneum
    F. Dignathodontidae
        Henia (Chaetechelyne) vesuviana
    F. Geophilidae
        Clinopodes cfr. poseidonis
    F. Chilenophilidae
        Zelanion antipodus


A matrix of transformation costs is created to relate the states to one another. The
cells of this matrix are defined as the minimum transformation cost required between
each pair of states based on insertion-deletion and base substitution costs (as in the
calculation of an alignment score). The next operation uses this transformation matrix
to diagnose a specific phylogenetic topology by means of existing dynamic programming
techniques (Sankoff and Rousseau, 1975) with the number of states greatly expanded.
This method has been implemented in the computer program POY (Gladstein and Wheeler,
1997) specifying the option -fixedstates (available via anonymous ftp at the site
ftp.amnh.org/pub/molecular/poy/).
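
The two steps described above, building a pairwise transformation-cost matrix among the observed sequence states and then evaluating a given topology by Sankoff-style dynamic programming, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the POY implementation: the function names (`edit_cost`, `cost_matrix`, `tree_cost`), the unit indel and substitution costs, the toy sequences, and the nested-tuple tree encoding are all hypothetical choices made for the example.

```python
def edit_cost(a, b, gap=1, sub=1):
    """Minimum cost of transforming string a into string b.

    Uses the standard alignment-score recursion with a flat indel cost
    (gap) and a flat base-substitution cost (sub). Both values are assumed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            cur.append(min(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def cost_matrix(states, gap=1, sub=1):
    """Pairwise transformation costs among the observed sequence states."""
    return [[edit_cost(s, t, gap, sub) for t in states] for s in states]


def tree_cost(tree, leaf_state, costs):
    """Sankoff down-pass: minimum total cost of one fixed topology.

    tree is a nested tuple of leaf names; leaf_state maps each leaf name
    to the index of its state; internal nodes may take any observed state.
    """
    n = len(costs)
    inf = float("inf")

    def down(node):
        if isinstance(node, str):  # leaf: only its observed state is allowed
            return [0 if k == leaf_state[node] else inf for k in range(n)]
        child_vectors = [down(child) for child in node]
        return [
            sum(min(costs[k][m] + vec[m] for m in range(n)) for vec in child_vectors)
            for k in range(n)
        ]

    return min(down(tree))


# Toy data (hypothetical): two short states differing by one inserted base.
states = ["ACGT", "ACGGT"]
costs = cost_matrix(states)
leaf_state = {"A": 0, "B": 0, "C": 1, "D": 1}

grouped = tree_cost((("A", "B"), ("C", "D")), leaf_state, costs)
mixed = tree_cost((("A", "C"), ("B", "D")), leaf_state, costs)
print(costs, grouped, mixed)
```

On this toy input the topology that unites the two taxa sharing the longer state costs one transformation, whereas the alternative topology costs two, so the shared insertion counts as a single event supporting that group.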

[Fig. 5: tree not reproduced. Terminals in printed order: Scutigera, Thereuopoda,
Craterostigmus, Australobius, Lithobius, Mecistocephalus, Nodocephalus, Zelanion,
Pseudohimantarium, Henia, Clinopodes, Cryptops, Theatops, Scolopocryptops, Scolopendra,
Ethmostigmus, Alipes.]

Fig. 5. Phylogenetic analysis of the data from fig. 4 using the “fixed character
states” method of Wheeler (1999) implemented in the computer program POY (Gladstein
and Wheeler, 1997). Commands: poy -fixedstates -noleading -norandomizeoutgroup -gap 1
-maxtrees 20 -multibuild 10 -seed -1 -slop 2 -checkslop 5. The two circles illustrate
the insertions of the Geophilomorpha (ca. 300 bp), and the Scolopendridae (ca. 25 bp).


To illustrate how the method works empirically, we have analyzed the 17 sequences
presented in figure 4 (and table 2) using the fixed character states method. The tree
obtained (fig. 5) shows some lack of resolution, but also shows certain clades highly
consistent with the current morphology of the group, as well as with the molecular
analyses of Giribet et al. (1999) and Edgecombe et al. (1999). Scolopendromorpha is
recognized as a clade that in turn includes a monophyletic family Scolopendridae,
presenting an insertion of about 25 bp. Another clade is defined by the insertion of
about 300 bp that groups all geophilomorph species except the mecistocephalids
(Mecistocephalus and Nodocephalus, the most basal group that lacks the insertion).
This is encouraging considering that this topology corresponds to the analysis of just
a few bases from the variable region V7. Thus, this method facilitates use of all the
information (variable and conserved) from ribosomal genes.
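
The two insertion sizes quoted above can be checked with simple arithmetic on the V7 fragment lengths. The sketch below is only illustrative: the lengths are an assumption, counted from the 10-base blocks as printed in fig. 4 (they may be off by a base or two), and the variable and function names are hypothetical. Family assignments follow table 2.

```python
# V7 fragment lengths in bp, counted from the blocks printed in fig. 4.
# Assumption: counts are taken from the figure as reproduced here.
v7_length = {
    "Scolopendra": 94, "Ethmostigmus": 93, "Alipes": 93,
    "Cryptops": 70, "Theatops": 69, "Scolopocryptops": 69,
    "Mecistocephalus": 69, "Nodocephalus": 70,
    "Pseudohimantarium": 375, "Henia": 384, "Clinopodes": 384, "Zelanion": 354,
}

scolopendridae = ["Scolopendra", "Ethmostigmus", "Alipes"]
cryptopidae = ["Cryptops", "Theatops", "Scolopocryptops"]
mecistocephalidae = ["Mecistocephalus", "Nodocephalus"]
other_geophilomorpha = ["Pseudohimantarium", "Henia", "Clinopodes", "Zelanion"]


def mean_length(taxa):
    """Average V7 fragment length (bp) over the listed genera."""
    return sum(v7_length[t] for t in taxa) / len(taxa)


# Extra length in Scolopendridae relative to the other scolopendromorph family.
scolopendrid_excess = mean_length(scolopendridae) - mean_length(cryptopidae)

# Extra length in non-mecistocephalid geophilomorphs relative to mecistocephalids.
geophilomorph_excess = mean_length(other_geophilomorpha) - mean_length(mecistocephalidae)

print(round(scolopendrid_excess, 2), round(geophilomorph_excess, 2))
```

With these counts the differences come to roughly 24 bp and 305 bp, in line with the "about 25 bp" and "about 300 bp" insertions described in the text.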

CONCLUSIONS 

Long SSU rRNA appears to be more common than claimed by some authors (e.g., Crease and
Colbourne, 1998) based on its occurrence in at least seven metazoan phyla:
Platyhelminthes, Mollusca, Onychophora, Arthropoda, Chaetognatha, Echinodermata, and
Mesozoa. However, large deletions appear to be rare, having so far only been found in
one group of arthropods (Symphyla) and in mesozoans. Probably many more taxa display
extraordinary 18S rRNA genes, but this will not be discovered until sampling within
each phylum increases. Nonetheless, the existence of variable regions should not
discourage the use of ribosomal genes in phylogenetic analyses, especially when
secondary structure predictions are combined with novel methods of DNA sequence data
analysis. In this sense, the characterization of secondary structural features by
means of the comparative method, and the use of these features (homologous regions) as
characters with multiple states, provides a powerful approach for the analysis of such
data using the fixed character states method. Maybe it is at such levels that
secondary structure information can best contribute to phylogenetic analyses.